The Foundations of Geospatial¶

Jorge Sanz¶

Madrid | 2023-05-31 | https://github.com/jsanz/sdsc/tree/madrid

image

About me¶

Education:

  • 1997-2001 | Surveying Engineering (UPV)
  • 2001-2004 | Cartography and Geodesy Engineering (UPV)

Work:

  • 2004-2006 Polytechnic Univ. València | Researcher and lecturer
  • 2006-2015 Prodevelop | GIS Consultant
  • 2015-2019 CARTO | Solutions Engineer and Support Manager
  • 2019-now Elastic | Principal Software Engineer

Socials:

  • 🌐 https://jorgesanz.net
  • 👔 http://www.linkedin.com/in/jsanz
  • 👨‍💻 https://github.com/jsanz
  • 🐘 https://mapstodon.space/@jorgesanz

cibervoluntarios.org¶

About these materials¶

  • Originally developed by Danny Sheehan
  • Checkout the madrid branch from https://github.com/jsanz/sdsc
  • With docker, build and run the repository, or make docker/run
  • With pip (Python 3.10 or later)
    • create a new virtual environment + pip install -r requirements.txt + jupyter notebook
    • or make python/run
  • With miniconda, follow Danny's instructions at https://github.com/nygeog/sdsc
  • Check the README.md for more details
In [1]:
import warnings
warnings.filterwarnings('ignore')

import pandas as pd
import geopandas as gpd
from shapely import wkt
import matplotlib.pyplot as plt
import rasterio
from rasterstats import zonal_stats
import numpy as np
import osmnx as ox

from IPython.display import IFrame

%matplotlib inline
In [2]:
import shapely
shapely.__version__
Out[2]:
'2.0.1'

🔵 Agenda¶

  • Geodesy is fun
  • Geospatial Data Models
  • shapely: Basic vector data management
  • pandas: Data Frames
  • geopandas: Geospatial Data Frames, import, overlays, etc
  • rasterio: Basic raster data management
  • Other topics: routing, geocoding, ...

🔵 Geodesy is fun¶

Based on Geospatial analysis with Python, Valencia, PyConES 2015 - 2015/11/22

fun with flags

🗺 Maps are flat, Earth is not 🌏¶

Earth is not a Blue Marble 🌎¶

Earth is more like a 🥔¶

So we use an approximation 🤔¶

So we use an approximation 🤔¶

Datums¶

  • A Datum is a set of facts that creates a reference system for Earth locations
  • It is made of an ellipsoid definition, tied to location to pin it to the Earth
  • They can aim to be used as a global reference (ex: WGS84, ETRS89) or local (ex: ED50, NAD83)
  • They can be horizontal or vertical

https://en.wikipedia.org/wiki/Geodetic_datum

Map Projections¶

  • Can't turn flat something spherical (or ellipsoidal), yeah like that 🍊
  • You need to choose a property to keep:
    • Angles (conformal): Mercator or Stereographic
    • Areas (authalic): Albers, Mollweide
    • Distances (equidistant): Werner
    • Compromise projections: Robinson, Winkel Tripel, Equal Earth
  • Some resourcces: map-projections, geo-projections, projectionwizard, myriahedral

Relevant xkcd: https://xkcd.com/977

Spatial Reference Systems and Coordinate Reference Systems¶

Toghether a DATUM and a PROJECTION make a CRS

The most famous catalog of CRS is EPSG

  • EPSG:4326 : WGS 84 + lat/lon
  • EPSG:4258 : ETRS89 + lat/lon
  • EPSG:25830 : ETRS89 + UTM Zone 30
  • EPSG:3857 : Mercator on a Sphere

https://epsg.org | https://spatialreference.org | https://epsg.io

Searching for results intersecting Retiro Park coordinates:

  • Datums: 78
  • Coordinate Reference Systems: 209
  • Conversions: 24

🔵 Geospatial Data Models¶

A data model is a way of defining and representing real world surfaces and characteristics in GIS.

There are two primary types of spatial data models: Vector and Raster.

Source: http://gsp.humboldt.edu/OLM/Courses/GSP_216_Online/lesson3-1/data-models.html

Vector¶

Vector data represents features as discrete points, lines, and polygons

Raster¶

Raster data represents features as a rectangular matrix of square cells (pixels)

image.png

Source: http://www.geo.umass.edu/courses/geo494a/Chapter2_GIS_Fundamentals.pdf

TIN Data model¶

A triangulated irregular network (TIN)

is a data model commonly used to represent terrain heights - Source: http://www.geo.umass.edu/courses/geo494a/Chapter2_GIS_Fundamentals.pdf

Typical use cases are for terrain inputs in Hydrologic Modeling / Watershed Modeling.

tin

Source: http://www.geo.umass.edu/courses/geo494a/Chapter2_GIS_Fundamentals.pdf

Vector Data Model¶

Vector data is very common, and is often used to represent features like roads and boundaries. Vector data comes in the form of points and lines that are geometrically and mathematically associated.

  • Points:
* One pair of coordinates defines the location of a point feature.

  • Polylines (LineStrings):
* Two or more pairs of coordinates that are connected define a line feature.
* A series of connected points

  • Polygons:
* Multiple pairs of coordinates that are connected and closed define a polygon feature.
* A series of connected points that loop back to the first point
  * Multiple "polygons" can exist in one layer
  * Polygons can have internal polygons or "holes"
  * The beginning and ending coordinates for a polygon are the same.

Source: http://gsp.humboldt.edu/OLM/Courses/GSP_216_Online/lesson3-1/data-models.html

https://www.ogc.org/standard/sfa/

🔵 Shapely¶

Shapely is a Python package for set-theoretic analysis and manipulation of planar features using (via Python’s ctypes module) functions from the well known and widely deployed GEOS library. GEOS, a port of the Java Topology Suite (JTS), is the geometry engine of the PostGIS spatial extension for the PostgreSQL RDBMS. The designs of JTS and GEOS are largely guided by the Open Geospatial Consortium’s Simple Features Access Specification 1 and Shapely adheres mainly to the same set of standard classes and operations. Shapely is thereby deeply rooted in the conventions of the geographic information systems (GIS) world, but aspires to be equally useful to programmers working on non-conventional problems.

Source: https://shapely.readthedocs.io/en/latest/manual.html

Vector¶

Points¶

In [3]:
from shapely.geometry import Point

point = Point(
    0.0, 
    0.0)

point
Out[3]:

Lines - (LineString)¶

In [4]:
from shapely.geometry import LineString

line = LineString([point, (2, 2)])

line
Out[4]:
In [5]:
a1 = point
a2 = Point(1, 1)
a3 = Point(2, 2)

a = LineString([a1, a2, (1, 2), a3])
b = LineString([a1, a2, (2, 1), a3])
In [6]:
a
Out[6]:
In [7]:
b
Out[7]:

Intersection of Two LineString Objects¶

In [8]:
x = a.intersection(b)
x
Out[8]:

source

Polygons¶

In [9]:
from shapely.geometry import Polygon

c = Polygon([[1, 1], [-1, 3], [3, 3], [3, 1]])
d = Polygon([[0, 0], [0, 4], [4, 4], [4, 1]])
In [10]:
c
Out[10]:
In [11]:
d
Out[11]:

Intersection of Two Polygons¶

In [12]:
x = c.intersection(d)
x
Out[12]:

MultiPoint¶

In [13]:
from shapely.geometry import MultiPoint

e = MultiPoint([(0, 0), (1, 1), (1, -1)])
e
Out[13]:
In [14]:
e.convex_hull  # minimum bounding geometry
Out[14]:

And more¶

  • Object attributes and methods: .area, .length, .geom_type, .distance() , .representative_point(), ...
  • Unary predicates: .has_z, .is_ccw, .is_simple, .is_valid
  • Binary Predicates: .contains(), .covers(), crosses(),.within(),
  • Set-theoretic methods: .intersection(), .difference(), .union()
  • Constructive methods: .buffer(), .convex_hull, .envelope, .simplify()
  • Other: affine transformation, efficient merge and union, Delaunay triangualtion, snapping, validation, ...

🔵 Pandas¶

pandas-img

Introduction to Pandas¶

pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. Source: https://pandas.pydata.org/pandas-docs/stable/index.html

If totally new to Pandas, you can think of Pandas data structure as the table (.dbf) of a Shapefile or as a Sheet in Excel or Google Spreadsheet.

Pandas DataFrame¶

DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. It is generally the most commonly used pandas object. Like Series, DataFrame accepts many different kinds of input. Source: https://pandas.pydata.org/pandas-docs/stable/getting_started/dsintro.html

pandas_dataframe Source: https://pandas.pydata.org/pandas-docs/stable/getting_started

Reading in a .csv into Pandas¶

In [15]:
# CSV of NYC subway entrances https://data.cityofnewyork.us/dataset/DOITT_SUBWAY_ENTRANCE_01_13SEPT2010/he7q-3hwy

csv_url = 'https://data.cityofnewyork.us/api/views/he7q-3hwy/rows.csv?accessType=DOWNLOAD'
In [16]:
df = pd.read_csv(csv_url)
In [17]:
# show a random sample of rows from the .csv data read in to the DataFrame object
df.sample(5)
Out[17]:
OBJECTID URL NAME the_geom LINE
1331 1187 http://web.mta.info/nyct/service/ Greenwich St & Morris St at NW corner POINT (-74.01431899947436 40.70649300060878) 1
1068 924 http://web.mta.info/nyct/service/ Beach 106th St & Rockaway Frwy at SW corner POINT (-73.82716299942538 40.58324800067608) H
887 743 http://web.mta.info/nyct/service/ 8th Ave & 58th St at NE corner POINT (-73.98213700028958 40.76736700057975) A-B-C-D-1
110 1844 http://web.mta.info/nyct/service/ 7th Ave & 55th St at SE corner POINT (-73.98080300054961 40.76405700096569) N-Q-R
784 640 http://web.mta.info/nyct/service/ Whitlock Ave & Westchester Ave at SE corner POINT (-73.88608200010141 40.82729500089228) 6
In [18]:
# show the first handful of records using .head()

df.head(5)
Out[18]:
OBJECTID URL NAME the_geom LINE
0 1734 http://web.mta.info/nyct/service/ Birchall Ave & Sagamore St at NW corner POINT (-73.86835600032798 40.84916900104506) 2-5
1 1735 http://web.mta.info/nyct/service/ Birchall Ave & Sagamore St at NE corner POINT (-73.86821300022677 40.84912800131844) 2-5
2 1736 http://web.mta.info/nyct/service/ Morris Park Ave & 180th St at NW corner POINT (-73.87349900050798 40.84122300105249) 2-5
3 1737 http://web.mta.info/nyct/service/ Morris Park Ave & 180th St at NW corner POINT (-73.8728919997833 40.84145300067447) 2-5
4 1738 http://web.mta.info/nyct/service/ Boston Rd & 178th St at SW corner POINT (-73.87962300013866 40.84081500075867) 2-5
In [19]:
df.shape
Out[19]:
(1928, 5)
In [20]:
df.value_counts('LINE')
Out[20]:
LINE
F          125
1          120
6          113
2-5         92
A           75
          ... 
A-H          1
A-FS         1
A-C-J-L      1
A-C-FS       1
e-F-M-R      1
Name: count, Length: 97, dtype: int64

🔵 GeoPandas¶

geopandas-img

GeoPandas¶

In [21]:
from shapely import wkt

gdf_subway = gpd.GeoDataFrame(
    df, geometry=df['the_geom'].apply(wkt.loads), 
    crs="EPSG:4326")

gdf_subway.plot();
In [22]:
# Get details of the Coordinate Reference System
gdf_subway.crs
Out[22]:
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
In [23]:
IFrame(src='https://epsg.io/4326', width=1000, height=600)
Out[23]:
In [24]:
import matplotlib.patheffects as pe

def plot_labels(df, func, geom_field='geometry', color="black", halo_size=0, halo_color="white"):
    df['coords'] = [ coords[0] for coords in df[geom_field].apply(lambda x: x.representative_point().coords[:])]
    for idx, row in df.iterrows(): plt.annotate(func(row), xy=row['coords'], color=color, horizontalalignment='center',
    path_effects=[pe.withStroke(linewidth=halo_size, foreground=halo_color)]) # Function to render labels on a plot

boro_geojson_link = 'https://data.cityofnewyork.us/api/geospatial/tqmj-j8zm?method=export&format=GeoJSON'
In [25]:
gdf = gpd.read_file(boro_geojson_link) # read in geojson as geopandas GeoDataFrame
In [26]:
gdf.plot(figsize=(7, 7),facecolor="teal", edgecolor="black").set_axis_off();  # nyc boros
plot_labels(gdf, lambda row: row['boro_name'],halo_size=3) # render labels with the boro_name field
plt.title("NYC Boroughs"); # add a title
In [27]:
gdf.head(5)
Out[27]:
boro_code boro_name shape_area shape_leng geometry coords
0 5 Staten Island 1623620725.06 325917.353702 MULTIPOLYGON (((-74.05051 40.56642, -74.05047 ... (-74.1456424883759, 40.57249073815982)
1 2 Bronx 1187174784.85 463179.772813 MULTIPOLYGON (((-73.89681 40.79581, -73.89694 ... (-73.86633177711762, 40.85607244168192)
2 1 Manhattan 636520830.768 357564.316391 MULTIPOLYGON (((-74.01093 40.68449, -74.01193 ... (-73.96025111088261, 40.78870866639666)
3 3 Brooklyn 1934143372.64 728197.541089 MULTIPOLYGON (((-73.86327 40.58388, -73.86381 ... (-73.94864366802588, 40.65433337705038)
4 4 Queens 3041418506.76 888199.731579 MULTIPOLYGON (((-73.82645 40.59053, -73.82642 ... (-73.81987112279087, 40.704093029191846)
In [28]:
# what is the coordinate reference system?

gdf.crs
Out[28]:
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

To change the Projection, we set the argument inplace=True¶

In [29]:
gdf.to_crs(epsg=2263, inplace=True)  
# changed the state of the projection systems

# what is 2263 ???
In [30]:
gdf.crs
Out[30]:
<Projected CRS: EPSG:2263>
Name: NAD83 / New York Long Island (ftUS)
Axis Info [cartesian]:
- X[east]: Easting (US survey foot)
- Y[north]: Northing (US survey foot)
Area of Use:
- name: United States (USA) - New York - counties of Bronx; Kings; Nassau; New York; Queens; Richmond; Suffolk.
- bounds: (-74.26, 40.47, -71.8, 41.3)
Coordinate Operation:
- name: SPCS83 New York Long Island zone (US Survey feet)
- method: Lambert Conic Conformal (2SP)
Datum: North American Datum 1983
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich

Geoprocessing¶

Very broadly, Geoprocessing is any operation involving geospatial data or methods. The Geographic Information Systems (GIS) software company Esri refers to it as "Computing with geographic data."

It is commonly used interchangeably with the term Spatial Analysis.

  • However, Spatial Analysis includes the interpretation of the results of Geoprocessing.

Professor Jochen Albrecht (Hunter College, NYC) defines Geoprocessing as;

...any GIS operation used to manipulate data. A typical geoprocessing operation takes an input dataset, performs an operation on that dataset, and returns the result of the operation as an output dataset, also referred to as derived data.

Common geoprocessing operations are geographic feature overlay, feature selection and analysis, topology processing, and data conversion.

Geoprocessing allows you to define, manage, and analyze geographic information used to make decisions. - Jochen Albrecht

Geoprocessing Library Selection¶

There are several Geoprocessing libraries and technologies. A few notable ones are;

  • PostGIS - Spatially Enabled PostgreSQL
  • CARTO - Carto Spatial Analysis
  • GeoPandas - Pandas Extended with Shapely
  • ArcPy / ArcGIS - Esri's Python site package
  • PyQGIS - QGIS's Python Package
  • OGR/GDAL
  • Any several more (Alteryx, Tableau, etc.)

SQL¶

  • Simple Feature Access – Part 2: SQL Option
  • Oracle Spatial
  • PostGIS
  • Carto Spatial SQL
  • Spatialite
  • Amazon Redshift
  • Amazon Athena
  • Trino
  • etc.
SELECT a.id, 
       b.id,
       ST_Intersects(a.the_geom, b.the_geom) AS the_geom
  FROM a, b

ArcPy¶

ArcPy is a Python site package that provides a useful and productive way to perform geographic data analysis, data conversion, data management, and map automation with Python. Esri

Arcpy is for ArcGIS Desktop or ArcGIS Pro. The ArcGIS API for Python, using ArcGIS Online is also available for multiple platforms.

arcpy.analysis.Intersect([feature_1, feature_2], out_feature)

Geometric Manipulations¶

Buffer¶

Returns a GeoSeries of geometries representing all points within a given distance of each geometric object. Source: https://geopandas.org/geometric_manipulations.html

In [31]:
fig, ax = plt.subplots(figsize=(6, 6))
gdf.plot(ax=ax, facecolor="teal", alpha=0.3).set_axis_off()

gdf.buffer(5000).plot(ax=ax, facecolor="none", alpha=0.7, edgecolor='red', linewidth=3)

plt.title('Buffer');

Centroid¶

In mathematics and physics, the centroid or geometric center of a plane figure is the arithmetic mean position of all the points in the figure. Source: https://en.wikipedia.org/wiki/Centroid

In [32]:
fig, ax = plt.subplots(figsize=(6, 6))
gdf.plot(ax=ax, facecolor="teal", alpha=0.3).set_axis_off()

gdf.centroid.plot(ax=ax, facecolor="red", markersize=200)

plt.title('Centroid');

Convex Hull¶

In geometry, the convex hull or convex envelope or convex closure of a shape is the smallest convex set that contains it. For a bounded subset of the plane, the convex hull may be visualized as the shape enclosed by a rubber band stretched around the subset. Source: https://en.wikipedia.org/wiki/Convex_hull

In [33]:
fig, ax = plt.subplots(figsize=(6, 6))
gdf.plot(ax=ax, facecolor="teal", alpha=0.3).set_axis_off()

gdf.convex_hull.plot(ax=ax, facecolor="none", alpha=0.7, edgecolor='red', linewidth=3)

plt.title('Convex Hull');

Envelope¶

Returns a GeoSeries of geometries representing the point or smallest rectangular polygon (with sides parallel to the coordinate axes) that contains each object. Source

In [34]:
fig, ax = plt.subplots(figsize=(7, 7))
gdf.plot(ax=ax, facecolor="teal", alpha=0.3).set_axis_off()

gdf.envelope.plot(ax=ax, facecolor="none", alpha=0.7,edgecolor='red', linewidth=3)

plt.title('Envelope');

Oriented envelope¶

Returns the general minimum bounding rectangle that contains the object. Unlike envelope this rectangle is not constrained to be parallel to the coordinate axes. If the convex hull of the object is a degenerate (line or point) this degenerate is returned. Source

In [35]:
# Not in geopandas, applying shapely function
oriented_envelopes = gdf['geometry'].apply(shapely.oriented_envelope)

fig, ax = plt.subplots(figsize=(7, 7))
gdf.plot(ax=ax, facecolor="teal", alpha=0.3).set_axis_off()

oriented_envelopes.plot(ax=ax, facecolor="none", alpha=0.7,edgecolor='red', linewidth=3);
plt.title('Oriented Envelope');

Simplify¶

Returns a GeoSeries containing a simplified representation of each object. Source: https://geopandas.org/geometric_manipulations.html#GeoSeries.boundary

In [36]:
fig, ax = plt.subplots(figsize=(5, 5))
gdf.plot(ax=ax, facecolor="teal", alpha=0.3).set_axis_off()

gdf.simplify(3000).plot(ax=ax, facecolor="none", alpha=0.7, edgecolor='red',  linewidth=3)

plt.title('Simplify');

Spatial Join¶

A spatial join uses binary predicates such as intersects and crosses to combine two GeoDataFrames based on the spatial relationship between their geometries.

A common use case might be a spatial join between a point layer and a polygon layer where you want to retain the point geometries and grab the attributes of the intersecting polygons.

spatial-join

Can use Spatial Join for a Point-in-Polygon analysis.

In [37]:
nyc_census_tracts = gpd.read_file(
    'https://data.cityofnewyork.us/api/geospatial/fxpq-c8ku?method=export&format=GeoJSON')

nyc_census_tracts = nyc_census_tracts[['boro_ct2010', 'geometry']]
In [38]:
nyc_census_tracts.head(5)
Out[38]:
boro_ct2010 geometry
0 5000900 MULTIPOLYGON (((-74.07921 40.64343, -74.07914 ...
1 1009800 MULTIPOLYGON (((-73.96433 40.75638, -73.96479 ...
2 1010200 MULTIPOLYGON (((-73.97124 40.76094, -73.97170 ...
3 1010400 MULTIPOLYGON (((-73.97446 40.76229, -73.97491 ...
4 1011300 MULTIPOLYGON (((-73.98412 40.75484, -73.98460 ...
In [39]:
nyc_census_tracts.plot(figsize=(6, 6), 
    facecolor="teal", edgecolor="white", linewidth=1
).set_axis_off();
plt.title("NYC tracts");
In [40]:
hotels_2020 = gpd.read_file(  # https://medium.com/@nygeog/data-science-data-focus-scraped-nyc-hotel-room-counts-for-covid19-response-quarantine-housing-8cc08c4d084a
    'https://raw.githubusercontent.com/autoceqr/get_nyc_hotels/master/data/nyc_hotels.geojson')

hotels_2020 = hotels_2020[['hotel_id', 'geometry', 'num_rooms']]
hotels_2020.head(5)
Out[40]:
hotel_id geometry num_rooms
0 181547 POINT (-73.83182 40.75967) 173
1 55876 POINT (-73.98305 40.74278) 131
2 42879 POINT (-73.98553 40.74463) 337
3 5122464 POINT (-73.98950 40.73152) 286
4 183709 POINT (-73.98667 40.74669) 172
In [41]:
hotels_2020['num_rooms'].hist(bins=10, color='purple');
In [42]:
fig, ax = plt.subplots(figsize=(7,6))
nyc_census_tracts.boundary.plot(ax=ax, edgecolor="black", alpha=0.2, linewidth=0.2).set_axis_off()
hotels_2020.plot(ax=ax,color='red');
plt.title("Hotels");
In [43]:
sj_gdf = gpd.sjoin(  # <- SJOIN == SPATIAL JOIN
    hotels_2020, nyc_census_tracts, how="right") 

# going to preserve tracts, to keep tracts w/ out hotels in DataFrame
sj_gdf.head(5)
Out[43]:
index_left hotel_id num_rooms boro_ct2010 geometry
0 NaN NaN NaN 5000900 MULTIPOLYGON (((-74.07921 40.64343, -74.07914 ...
1 264.0 905473.0 206.0 1009800 MULTIPOLYGON (((-73.96433 40.75638, -73.96479 ...
2 118.0 56409.0 398.0 1010200 MULTIPOLYGON (((-73.97124 40.76094, -73.97170 ...
2 155.0 270490.0 238.0 1010200 MULTIPOLYGON (((-73.97124 40.76094, -73.97170 ...
2 192.0 357419.0 909.0 1010200 MULTIPOLYGON (((-73.97124 40.76094, -73.97170 ...
In [44]:
# count hotels in tracts
count_hotels_in_tracts = sj_gdf[['boro_ct2010', 'hotel_id']].groupby(
    ['boro_ct2010'], as_index=False).count()

count_hotels_in_tracts.rename(columns={'hotel_id': 'num_hotels'}, inplace=True)
count_hotels_in_tracts.sort_values('num_hotels', ascending=False)
Out[44]:
boro_ct2010 num_hotels
116 1011500 14
137 1013700 12
109 1011100 11
82 1008400 11
120 1011900 9
... ... ...
773 3019100 0
772 3019000 0
771 3018800 0
770 3018700 0
2164 5032300 0

2165 rows × 2 columns

In [45]:
# count total hotel rooms in tracts
count_rooms_in_tracts = sj_gdf[['boro_ct2010','num_rooms']].groupby(
    ['boro_ct2010'], as_index=False).sum()
count_rooms_in_tracts.sort_values('num_rooms', ascending=False)
Out[45]:
boro_ct2010 num_rooms
120 1011900 5766.0
137 1013700 5230.0
116 1011500 4427.0
131 1013100 4368.0
91 1009200 4036.0
... ... ...
773 3019100 0.0
772 3019000 0.0
771 3018800 0.0
770 3018700 0.0
2164 5032300 0.0

2165 rows × 2 columns

In [46]:
# merge (join) our tables

nyc_census_tracts_with_hotel_data = nyc_census_tracts.merge(
    count_hotels_in_tracts, on='boro_ct2010', how='left').merge(
        count_rooms_in_tracts, on='boro_ct2010', how='left')
In [47]:
nyc_census_tracts_with_hotel_data.sort_values(
    ['num_hotels', 'num_rooms'], ascending=[False, False]) 
Out[47]:
boro_ct2010 geometry num_hotels num_rooms
1997 1011500 MULTIPOLYGON (((-73.98979 40.75723, -73.99028 ... 14 4427.0
1776 1013700 MULTIPOLYGON (((-73.97623 40.76567, -73.97670 ... 12 5230.0
765 1011100 MULTIPOLYGON (((-73.99163 40.75471, -73.99208 ... 11 3256.0
675 1008400 MULTIPOLYGON (((-73.98089 40.75348, -73.98094 ... 11 2080.0
416 1011900 MULTIPOLYGON (((-73.98226 40.75739, -73.98271 ... 9 5766.0
... ... ... ... ...
2159 2005100 MULTIPOLYGON (((-73.92518 40.81801, -73.92622 ... 0 0.0
2160 2006300 MULTIPOLYGON (((-73.92222 40.83416, -73.92349 ... 0 0.0
2161 5007500 MULTIPOLYGON (((-74.08588 40.63669, -74.08569 ... 0 0.0
2162 5007700 MULTIPOLYGON (((-74.08709 40.64033, -74.08721 ... 0 0.0
2164 4017100 MULTIPOLYGON (((-73.91354 40.75347, -73.91257 ... 0 0.0

2165 rows × 4 columns

In [48]:
# number of hotels in census tracts
nctwhd = nyc_census_tracts_with_hotel_data[nyc_census_tracts_with_hotel_data["num_hotels"]>0]

fig, ax = plt.subplots(figsize=(8,6))
nyc_census_tracts.boundary.plot(ax=ax,edgecolor="black",linewidth=0.2)
nctwhd.plot(ax=ax,column='num_hotels', colormap="RdPu", legend=True).set_axis_off();
plt.title("Hotels per tract");
In [49]:
# number of hotel rooms in census tracts
nctwhd = nyc_census_tracts_with_hotel_data[nyc_census_tracts_with_hotel_data["num_rooms"]>0]

fig, ax = plt.subplots(figsize=(8,6))
nyc_census_tracts.boundary.plot(ax=ax,edgecolor="black",linewidth=0.2)
nctwhd.plot(ax=ax,column='num_rooms', colormap="YlGnBu", legend=True).set_axis_off();
plt.title("Rooms per tract");

Set-Operations with Overlay¶

When working with multiple spatial datasets – especially multiple polygon or line datasets – users often wish to create new shapes based on places where those datasets overlap (or don’t overlap). These manipulations are often referred using the language of sets – intersections, unions, and differences. These types of operations are made available in the geopandas library through the overlay function.

overlay Source: https://geopandas.org/set_operations.html

The basic idea is demonstrated by the graphic below but keep in mind that overlays operate at the DataFrame level, not on individual geometries, and the properties from both are retained. In effect, for every shape in the first GeoDataFrame, this operation is executed against every other shape in the other GeoDataFrame. Source: https://geopandas.org/set_operations.html

These Relationships follow the DE-9IM model¶

The Dimensionally Extended nine-Intersection Model (DE-9IM) is a topological model and a standard used to describe the spatial relations of two regions (two geometries in two-dimensions, R2), in geometry, point-set topology, geospatial topology, and fields related to computer spatial analysis. The spatial relations expressed by the model are invariant to rotation, translation and scaling transformations. Source: https://en.wikipedia.org/wiki/DE-9IM

In [50]:
IFrame(src='https://en.wikipedia.org/wiki/DE-9IM', width=1000, height=600)
Out[50]:
In [51]:
import geopandas
from shapely.geometry import Polygon
        
# WE HAVE THESE 2 SETS OF POLYGON LAYERS EACH WITH 2 FEATURES
polys1 = geopandas.GeoSeries([
    Polygon([(0,0), (2,0), (2,2), (0,2)]),
    Polygon([(2,2), (4,2), (4,4), (2,4)])])

polys2 = geopandas.GeoSeries([
    Polygon([(1,1), (3,1), (3,3), (1,3)]),
    Polygon([(3,3), (5,3), (5,5), (3,5)])])

df1 = geopandas.GeoDataFrame({'geometry': polys1, 'df1':[1,2]})
df2 = geopandas.GeoDataFrame({'geometry': polys2, 'df2':[1,2]})

ax = df1.plot(color='red', figsize=(5, 5));
df2.plot(ax=ax, color='green', alpha=0.5);

Union¶

The output feature class will contain polygons representing the geometric union of all the inputs as well as all the fields from all the input feature classes. Source: https://desktop.arcgis.com/en/arcmap/10.3/tools/analysis-toolbox/how-union-analysis-works.htm

esri_union

In [52]:
res_union = geopandas.overlay(df1, df2, how='union')
res_union  # FOR UNION - ALL AREAS ARE OUTPUTED, SLICED BY OVERLAP BORDERS
Out[52]:
df1 df2 geometry
0 1.0 1.0 POLYGON ((2.00000 2.00000, 2.00000 1.00000, 1....
1 2.0 1.0 POLYGON ((2.00000 2.00000, 2.00000 3.00000, 3....
2 2.0 2.0 POLYGON ((4.00000 4.00000, 4.00000 3.00000, 3....
3 1.0 NaN POLYGON ((2.00000 0.00000, 0.00000 0.00000, 0....
4 2.0 NaN MULTIPOLYGON (((3.00000 3.00000, 4.00000 3.000...
5 NaN 1.0 MULTIPOLYGON (((2.00000 2.00000, 3.00000 2.000...
6 NaN 2.0 POLYGON ((3.00000 5.00000, 5.00000 5.00000, 5....
In [53]:
ax = res_union.plot(alpha=0.5, cmap='tab10', figsize=(5, 5))
df1.plot(ax=ax, facecolor='none', edgecolor='k');
df2.plot(ax=ax, facecolor='none', edgecolor='k');

plot_labels(res_union, lambda row: f"{row['df1']}|{row['df2']}",color="black",halo_color="white",halo_size=2)

Intersect¶

The Intersect tool calculates the geometric intersection of any number of feature classes and feature layers. The features, or portion of features, that are common (or shared) to all inputs (that is, they intersect) will be written to the output feature class. Source: https://desktop.arcgis.com/en/arcmap/10.3/tools/analysis-toolbox/how-intersect-analysis-works.htm

Diagrams below from: https://desktop.arcgis.com/en/arcmap/10.3/tools/analysis-toolbox/how-intersect-analysis-works.htm

Polygon inputs and polygon output¶

esri_intersect_poly_poly

Line inputs¶

esri_intersect_line_line

Line inputs and point output¶

esri_intersect_line_point

Point inputs¶

esri_intersect_point_point

Diagrams above from: https://desktop.arcgis.com/en/arcmap/10.3/tools/analysis-toolbox/how-intersect-analysis-works.htm

In [54]:
res_intersection = geopandas.overlay(df1, df2, how='intersection')
res_intersection 
Out[54]:
df1 df2 geometry
0 1 1 POLYGON ((2.00000 2.00000, 2.00000 1.00000, 1....
1 2 1 POLYGON ((2.00000 2.00000, 2.00000 3.00000, 3....
2 2 2 POLYGON ((4.00000 4.00000, 4.00000 3.00000, 3....
In [55]:
ax = res_intersection.plot(cmap='tab10', figsize=(4, 4))

df1.plot(ax=ax, facecolor='none', edgecolor='k');
df2.plot(ax=ax, facecolor='none', edgecolor='k');

plot_labels(res_intersection, lambda row: f"{row['df1']}|{row['df2']}",color="white")

Difference¶

Seems most similar to Esri's Erase overlay function.

Creates a feature class by overlaying the Input Features with the polygons of the Erase Features. Only those portions of the input features falling outside the erase features outside boundaries are copied to the output feature class. Source: https://desktop.arcgis.com/en/arcmap/10.3/tools/analysis-toolbox/erase.htm

difference_erase

In [56]:
res_difference = geopandas.overlay(df1, df2, how='difference')
res_difference  # ANY AREAS THAT DON'T OVERLAP - USE THE SECOND FEATURE TO ERASE FROM 1ST FEATURE
Out[56]:
geometry df1
0 POLYGON ((2.00000 0.00000, 0.00000 0.00000, 0.... 1
1 MULTIPOLYGON (((3.00000 3.00000, 4.00000 3.000... 2
In [57]:
ax = res_difference.plot(cmap='tab10', figsize=(6, 6))

df1.plot(ax=ax, facecolor='none', edgecolor='k')
df2.plot(ax=ax, facecolor='none', edgecolor='k');

plot_labels(res_difference, lambda row: f"{row['df1']}", color="white")

Symetrical Difference¶

Features or portions of features in the input and update features that do not overlap will be written to the output feature class. Source: https://desktop.arcgis.com/en/arcmap/10.3/tools/analysis-toolbox/symmetrical-difference.htm

Inverse or opposite of the Intersect

symetrical_difference

In [58]:
res_symdiff = geopandas.overlay(df1, df2, how='symmetric_difference')
res_symdiff
Out[58]:
df1 df2 geometry
0 1.0 NaN POLYGON ((2.00000 0.00000, 0.00000 0.00000, 0....
1 2.0 NaN MULTIPOLYGON (((3.00000 3.00000, 4.00000 3.000...
2 NaN 1.0 MULTIPOLYGON (((2.00000 2.00000, 3.00000 2.000...
3 NaN 2.0 POLYGON ((3.00000 5.00000, 5.00000 5.00000, 5....
In [59]:
ax = res_symdiff.plot(cmap='tab10', figsize=(6, 6))

df1.plot(ax=ax, facecolor='none', edgecolor='k');
df2.plot(ax=ax, facecolor='none', edgecolor='k');

plot_labels(res_symdiff, lambda row: f"{row['df1']}|{row['df2']}", color="white")

Dissolve¶

https://geopandas.org/aggregation_with_dissolve.html

dissolve

Aggregates features based on specified attributes. Source: https://desktop.arcgis.com/en/arcmap/latest/tools/data-management-toolbox/dissolve.htm

In [60]:
countries = geopandas.read_file(
    geopandas.datasets.get_path('naturalearth_lowres'))
In [61]:
countries.plot(cmap='tab10', figsize=(10, 10));

Dissolve¶

In [62]:
countries = countries[['continent', 'geometry']]
In [63]:
continents = countries.dissolve(by='continent')
In [64]:
continents.plot(cmap='tab10', figsize=(10, 10)).set_axis_off();
plot_labels(continents, lambda row: f"{row.name}",halo_size=3);
plt.title("Continents");

Dissolve and Sum¶

In [65]:
countries = geopandas.read_file(
    geopandas.datasets.get_path('naturalearth_lowres'))
countries.head(5)
Out[65]:
pop_est continent name iso_a3 gdp_md_est geometry
0 889953.0 Oceania Fiji FJI 5496 MULTIPOLYGON (((180.00000 -16.06713, 180.00000...
1 58005463.0 Africa Tanzania TZA 63177 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982...
2 603253.0 Africa W. Sahara ESH 907 POLYGON ((-8.66559 27.65643, -8.66512 27.58948...
3 37589262.0 North America Canada CAN 1736425 MULTIPOLYGON (((-122.84000 49.00000, -122.9742...
4 328239523.0 North America United States of America USA 21433226 MULTIPOLYGON (((-122.84000 49.00000, -120.0000...
In [66]:
continents = countries[['geometry','continent','pop_est']].dissolve(
    by='continent', aggfunc='sum')
continents
Out[66]:
geometry pop_est
continent
Africa MULTIPOLYGON (((-11.43878 6.78592, -11.70819 6... 1.306370e+09
Antarctica MULTIPOLYGON (((-61.13898 -79.98137, -60.61012... 4.490000e+03
Asia MULTIPOLYGON (((48.67923 14.00320, 48.23895 13... 4.550277e+09
Europe MULTIPOLYGON (((-53.55484 2.33490, -53.77852 2... 7.454125e+08
North America MULTIPOLYGON (((-155.22217 19.23972, -155.5421... 5.837560e+08
Oceania MULTIPOLYGON (((147.91405 -43.21152, 147.56456... 4.120487e+07
Seven seas (open ocean) POLYGON ((68.93500 -48.62500, 69.58000 -48.940... 1.400000e+02
South America MULTIPOLYGON (((-68.63999 -55.58002, -69.23210... 4.270667e+08
In [67]:
continents.plot(column = 'pop_est', scheme='quantiles', 
    cmap='YlOrRd', figsize=(10, 10)).set_axis_off();
plot_labels(continents, lambda row: "{0:,d}".format(int(row['pop_est'])),halo_size=2)
plt.title("sum(pop_est)");

Merging Data¶

Spatial Joins¶

https://geopandas.org/mergingdata.html

In [68]:
world = geopandas.read_file(
    geopandas.datasets.get_path('naturalearth_lowres'))
cities = geopandas.read_file(
    geopandas.datasets.get_path('naturalearth_cities'))

countries = world[['geometry', 'name', 'pop_est', 'gdp_md_est']]
countries = countries.rename(columns={'name':'country'})
In [69]:
fig, ax = plt.subplots(figsize=[9,6]);
countries.plot(ax=ax,color='lightblue',edgecolor="teal");
cities.plot(ax=ax,color='purple').set_axis_off();
plt.title("Countries and cities");
In [70]:
countries.head(5)
Out[70]:
geometry country pop_est gdp_md_est
0 MULTIPOLYGON (((180.00000 -16.06713, 180.00000... Fiji 889953.0 5496
1 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... Tanzania 58005463.0 63177
2 POLYGON ((-8.66559 27.65643, -8.66512 27.58948... W. Sahara 603253.0 907
3 MULTIPOLYGON (((-122.84000 49.00000, -122.9742... Canada 37589262.0 1736425
4 MULTIPOLYGON (((-122.84000 49.00000, -120.0000... United States of America 328239523.0 21433226
In [71]:
cities.head(5)
Out[71]:
name geometry
0 Vatican City POINT (12.45339 41.90328)
1 San Marino POINT (12.44177 43.93610)
2 Vaduz POINT (9.51667 47.13372)
3 Lobamba POINT (31.20000 -26.46667)
4 Luxembourg POINT (6.13000 49.61166)

Spatial Join to capture the Countries Data, specifically GDP, we want to "JOIN" Country GDP to the Cities¶

In [72]:
cities_with_country = geopandas.sjoin(
    cities, countries, 
    how="inner", op='intersects')

Let's look at the Spatially Joined Output¶

In [73]:
cities_with_country.head(5)
Out[73]:
name geometry index_right country pop_est gdp_md_est
0 Vatican City POINT (12.45339 41.90328) 141 Italy 60297396.0 2003576
1 San Marino POINT (12.44177 43.93610) 141 Italy 60297396.0 2003576
226 Rome POINT (12.48131 41.89790) 141 Italy 60297396.0 2003576
2 Vaduz POINT (9.51667 47.13372) 114 Austria 8877067.0 445075
212 Vienna POINT (16.36469 48.20196) 114 Austria 8877067.0 445075

Let's look at the output Geometry¶

In [74]:
cities_with_country.plot(color='red',figsize=[10,7]);

Plotting our GDP values assigned to Cities as a Graduated Symbol¶

In [75]:
fig, ax = plt.subplots(figsize=(12, 4))
countries.plot(ax=ax, facecolor='gray')

cities_with_country.plot(
    ax=ax, column='gdp_md_est', scheme='quantiles', legend=True,
    cmap='Greens', edgecolor='white', alpha=0.6,
    markersize=cities_with_country['gdp_md_est']/5000.,
    legend_kwds={"title": 'GDP', "loc": 4}).set_axis_off()

plt.title('Gross Domestic Product (GDP) - 2016', {'fontsize': 27});
plt.savefig('img/gdp.png')

🔵 Raster Data Model¶

Raster¶

[Rasters] are frequently represented as a continuous grid of square cells, each containing a value indicating the (estimated) average height or strength of the field in that cell. In most of the literature and within software packages the points/ lines/ areas model is described as vector data, whilst the grid model is described as raster (or image) data. Source: https://www.spatialanalysisonline.com/HTML/index.html

Raster Basics¶

A raster is a rectangular grid of pixels with values that can continuous (e.g. elevation) or categorical (e.g. land use). This data structure is very common - jpg images on the web, photos from your digital camera, and the jupyterhub icon are all rasters. Source: https://github.com/geohackweek/raster-2019/blob/master/notebooks/0-introduction.ipynb

Common properties of any raster:

number of rows and columns (sometimes referred to as lines and samples) data type (dtype, or bit depth) - e.g., 8-bit (2^8 possible values, 0-255) some kind of resolution information, often dots per inch (dpi) with raster graphics A geospatial raster is only different from a digital photo in that it is accompanied by metadata that connects the grid to a particular location. Source: https://github.com/geohackweek/raster-2019/blob/master/notebooks/0-introduction.ipynb

raster_data_model

Source: National Ecological Observatory Network (NEON) via: https://github.com/geohackweek/raster-2019/blob/master/notebooks/0-introduction.ipynb

Examples of categorical rasters¶

Some rasters contain categorical data where each pixel represents a discrete class such as a landcover type (e.g., "coniferous forest" or "grassland") rather than a continuous value such as elevation or temperature. Some examples of classified maps include:

  • Landcover / land-use maps.
  • Snowcover masks (binary snow or no snow)

The following map shows elevation data for the NEON Harvard Forest field site. In this map, the elevation data (a continuous variable) has been divided up into categories to yield a categorical raster. Source: https://github.com/geohackweek/raster-2019/blob/master/notebooks/0-introduction.ipynb

Land Use Raster (Discrete)¶

category_raster

Source: Homer, C.G., et al., 2015, Completion of the 2011 National Land Cover Database for the conterminous United States-Representing a decade of land cover change information. Photogrammetric Engineering and Remote Sensing, v. 81, no. 5, p. 345-354) via https://datacarpentry.org/organization-geospatial/01-intro-raster-data/index.html

Digital Elevation Raster (Continuous)¶

dem Source: https://gisgeography.com/free-global-dem-data-sources/

Raster and Resolution¶

resolution of a raster represents the area on the ground that each pixel of the raster covers. The image below illustrates the effect of changes in resolution. Source: https://datacarpentry.org/organization-geospatial/01-intro-raster-data/index.html

raster_resolution

Source: National Ecological Observatory Network (NEON) via https://datacarpentry.org/organization-geospatial/01-intro-raster-data/index.html

Multispectral (multiband) Raster¶

A raster can contain one or more bands. One type of multi-band raster dataset that is familiar to many of us is a color image. A basic color image consists of three bands: red, green, and blue. Each band represents light reflected from the red, green or blue portions of the electromagnetic spectrum. The pixel brightness for each band, when composited creates the colors that we see in an image. Source: https://datacarpentry.org/organization-geospatial/01-intro-raster-data/index.html

multispectral_raster

Source: National Ecological Observatory Network (NEON) via https://datacarpentry.org/organization-geospatial/01-intro-raster-data/index.html

Let's compare a two maps, one derived by Vector Data and one derived from Raster Data¶

In [76]:
import ipyleaflet  # source: https://github.com/geohackweek/raster-2019/blob/master/notebooks/3-visualization-and-modis.ipynb
from ipyleaflet import Map, Rectangle, basemaps, basemap_to_tiles, TileLayer, SplitMapControl, Polygon
import ipywidgets
import datetime
import re
bbox = [43.16, -11.32, 43.54, -11.96]
west, north, east, south = bbox
bbox_ctr = [0.5*(north+south), 0.5*(west+east)]

m = Map(center=bbox_ctr, zoom=5)  # MODIS great for large areas (onl)
right_layer = basemap_to_tiles(basemaps.NASAGIBS.ModisTerraTrueColorCR, "2020-04-01")
left_layer = TileLayer()
control = SplitMapControl(left_layer=left_layer, right_layer=right_layer)
m.add_control(control)
m
Out[76]:
Map(center=[-11.64, 43.349999999999994], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_in_t…

Let's take a look at some very simple Raster Data model¶

In [77]:
import numpy as np
import pandas as pd 

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
# let's allow our dataframe printouts in jupyter to be larger
In [78]:
simple_image = np.load('data/image_file.npy')   
# .npy is a numpy file storage format

What is this simple_image?¶

In [79]:
simple_image
Out[79]:
array([[  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,  11, 104, 159, 159, 232, 195, 102,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,  19, 154, 227, 254, 235, 174, 167, 233, 184,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0, 106, 217, 254, 239, 159,  23,   0,   0,  68, 221,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         72, 249, 251, 131,  11,   0,   0,  53,  13,   2,  40,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  26,
        243, 250,  72,   0,   0,   5, 184, 251, 103,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 152,
        254, 170,   0,   0,   0, 148, 254, 254,  85,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  21, 225,
        254,  41,   0,  30, 198, 253, 254,  88,   2,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  25, 231,
        254,  67,  22, 181, 251, 246, 108,  20,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  37, 241,
        254, 146, 240, 254, 238,  62,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  41, 226,
        254, 254, 254, 192,  21,   3,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  91, 229,
        254, 246, 120,  16,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  56, 248, 254,
        254, 190,  15,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,  92, 249, 229, 131,
        213, 254, 136,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,  69, 233, 248,  65,   0,
        178, 254, 143,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   9,  73, 252, 248,  88,   0,   0,
         84, 249, 180,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,  44, 254, 246,  50,   0,   0,  25,
        210, 247,  86,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,  25, 213, 254,  84,   0,   2,  82, 223,
        254, 203,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,  63, 254, 254,   4,  33, 168, 254, 254,
        176,   5,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,  31, 225, 254, 227, 240, 254, 235, 133,
          9,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,  42, 193, 254, 254, 222,  29,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0]], dtype=uint8)

Plotting this image data in Pandas aas an .ndarray like a DataFrame?¶

In [80]:
pd.DataFrame(simple_image).head(28)
Out[80]:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11 104 159 159 232 195 102 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 19 154 227 254 235 174 167 233 184 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 106 217 254 239 159 23 0 0 68 221 0 0 0 0
7 0 0 0 0 0 0 0 0 0 0 0 0 0 72 249 251 131 11 0 0 53 13 2 40 0 0 0 0
8 0 0 0 0 0 0 0 0 0 0 0 0 26 243 250 72 0 0 5 184 251 103 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0 0 0 152 254 170 0 0 0 148 254 254 85 0 0 0 0 0 0
10 0 0 0 0 0 0 0 0 0 0 0 21 225 254 41 0 30 198 253 254 88 2 0 0 0 0 0 0
11 0 0 0 0 0 0 0 0 0 0 0 25 231 254 67 22 181 251 246 108 20 0 0 0 0 0 0 0
12 0 0 0 0 0 0 0 0 0 0 0 37 241 254 146 240 254 238 62 0 0 0 0 0 0 0 0 0
13 0 0 0 0 0 0 0 0 0 0 0 41 226 254 254 254 192 21 3 0 0 0 0 0 0 0 0 0
14 0 0 0 0 0 0 0 0 0 0 0 91 229 254 246 120 16 0 0 0 0 0 0 0 0 0 0 0
15 0 0 0 0 0 0 0 0 0 0 56 248 254 254 190 15 0 0 0 0 0 0 0 0 0 0 0 0
16 0 0 0 0 0 0 0 0 0 92 249 229 131 213 254 136 0 0 0 0 0 0 0 0 0 0 0 0
17 0 0 0 0 0 0 0 0 69 233 248 65 0 178 254 143 0 0 0 0 0 0 0 0 0 0 0 0
18 0 0 0 0 0 0 9 73 252 248 88 0 0 84 249 180 0 0 0 0 0 0 0 0 0 0 0 0
19 0 0 0 0 0 0 44 254 246 50 0 0 25 210 247 86 0 0 0 0 0 0 0 0 0 0 0 0
20 0 0 0 0 0 25 213 254 84 0 2 82 223 254 203 0 0 0 0 0 0 0 0 0 0 0 0 0
21 0 0 0 0 0 63 254 254 4 33 168 254 254 176 5 0 0 0 0 0 0 0 0 0 0 0 0 0
22 0 0 0 0 0 31 225 254 227 240 254 235 133 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23 0 0 0 0 0 0 42 193 254 254 222 29 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
24 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
25 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
26 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Interesting... Perhaps I can "show this array" somehow!¶

In [81]:
import matplotlib.pyplot as plt

# Color maps: https://matplotlib.org/stable/tutorials/colors/colormaps.html
plt.imshow(simple_image, cmap='gray')
plt.show();

Rasterio¶

Rasterio: access to geospatial raster data

Geographic information systems use GeoTIFF and other formats to organize and store gridded raster datasets such as satellite imagery and terrain models. Rasterio reads and writes these formats and provides a Python API based on Numpy N-dimensional arrays and GeoJSON.

Source: https://rasterio.readthedocs.io/en/latest/index.html

In [82]:
import rasterio
cog = 'https://nex-gddp-cmip6-cog.s3-us-west-2.amazonaws.com/monthly/CMIP6_ensemble_median/tas/tas_month_ensemble-median_ssp585_205007.tif'
dataset = rasterio.open(cog)
In [83]:
plt.figure(figsize=(8, 8));
plt.imshow(dataset.read(1), cmap='inferno');
plt.title('2050-07 NSAA GDDP Monthly Temperature at Surface');

Let's look at some LANDSAT 8 Data¶

Bands 2, 3, and 4 are visible blue, green, and red. Source: https://landsat.gsfc.nasa.gov/landsat-8/landsat-8-bands/

Much of this is referenced from: https://automating-gis-processes.github.io/CSC/lessons/L5/overview.html

In [84]:
import urllib
import os.path
import rasterio

# https://automating-gis-processes.github.io/CSC/notebooks/L5/reading-raster.html
raster_url = 'https://github.com/Automating-GIS-processes/CSC18/raw/master/data/Helsinki_masked_p188r018_7t20020529_z34__LV-FIN.tif'
helsinki_local_file = 'data/raster/Helsinki_masked_p188r018_7t20020529_z34__LV-FIN.tif'
In [85]:
if not os.path.isfile(helsinki_local_file):
    urllib.request.urlretrieve(raster_url, helsinki_local_file)
else:
    print("Data already in disk")
    
dataset = rasterio.open(helsinki_local_file)
dataset
Data already in disk
Out[85]:
<open DatasetReader name='data/raster/Helsinki_masked_p188r018_7t20020529_z34__LV-FIN.tif' mode='r'>
In [86]:
dataset.count # number of raster bands
Out[86]:
7
In [87]:
from matplotlib import pyplot
from rasterio.plot import show

fig, (axr, axg, axb) = pyplot.subplots(1,3, figsize=(21,7))
show((dataset, 3), ax=axr, cmap='Reds', title='red channel')
show((dataset, 2), ax=axg, cmap='Greens', title='green channel')
show((dataset, 1), ax=axb, cmap='Blues', title='blue channel')
pyplot.show()
In [88]:
from rasterio.plot import show_hist

dataset_rgb = dataset.read([3,2,1])

fig, (axrgb, axhist) = pyplot.subplots(1, 2, figsize=(14,7));
show(dataset_rgb,ax=axrgb)
show_hist(dataset_rgb, ax=axhist, bins=50, lw=0.0, stacked=False, alpha=0.3,
    histtype='stepfilled', title="Histogram of Bands")
axhist.legend(['1', '2', '3']);
In [89]:
dataset.bounds
Out[89]:
BoundingBox(left=698592.0, bottom=6656859.0, right=735300.0, top=6697870.5)

Raster Operations¶

Raster Math¶

raster-math

Source: https://desktop.arcgis.com/en/arcmap/10.7/tools/spatial-analyst-toolbox/cell-statistics.htm

Zonal Statistics¶

img

Source: https://gisgeography.com/zonal-statistics/

Zonal statistics summarizes the cells for a given region or line.

In [90]:
from shapely.geometry import Polygon
sample_coordinates = Polygon([
    (716946.0, 6677364.75),
    (713046.0, 6676164.75),
    (714846.0, 6666164.75),
    (720846.0, 6668164.75),
])
sample_coordinates
Out[90]:
In [91]:
from shapely.geometry import mapping, Polygon
import fiona

poly = Polygon(sample_coordinates)

schema = {'geometry': 'Polygon'}

with fiona.open('data/sample_coordinates.shp', 'w', 'ESRI Shapefile', schema) as c:
    c.write({'geometry': mapping(poly)})
    
# save a shapefile with fiona
In [92]:
import geopandas as gpd
from matplotlib import pyplot as plt
from rasterio.plot import show
fig, ax = plt.subplots(figsize=(8, 8))
p = gpd.GeoSeries(poly)
p.plot(ax=ax, facecolor='none', edgecolor='red')

rasterio.plot.show(dataset, ax=ax)
plt.show()
In [93]:
from rasterstats import zonal_stats

Band 1¶

In [94]:
stats = zonal_stats('data/sample_coordinates.shp', helsinki_local_file, band=1, stats=['min', 'max', 'median', 'majority', 'sum'])
stats
Out[94]:
[{'min': 27.0,
  'max': 252.0,
  'sum': 4118148.0,
  'median': 61.0,
  'majority': 60.0}]

Band 6¶

In [95]:
stats = zonal_stats('data/sample_coordinates.shp', helsinki_local_file, band=6, stats=['min', 'max', 'median', 'majority', 'sum'])
stats
Out[95]:
[{'min': 103.0,
  'max': 160.0,
  'sum': 6997422.0,
  'median': 106.0,
  'majority': 105.0}]

🔵 Geocoding¶

Geocoding is the process of taking input text, such as an address or the name of a place, and returning a latitude/longitude location on the Earth's surface for that place.[1] Reverse geocoding, on the other hand, converts geographic coordinates to a description of a location, usually the name of a place or an addressable location. Source: https://en.wikipedia.org/wiki/Geocoding

geocoding Source: Source: SCHOOL ANALYST via Geospatial World

Geocoding relies on a computer representation of address points, the street / road network, together with postal and administrative boundaries.

  • Geocode (verb):[2] provide geographical coordinates corresponding to (a location).
  • Geocode (noun): is a code that represents a geographic entity (location or object). Sometimes the term can be used in a broader sense:[3] the characterization of a neighborhood, locality, etc., according to such demographic features as ethnic composition or the average income or educational level of its inhabitants, especially as used in marketing.
  • Geocoder (noun): a piece of software or a (web) service that implements a geocoding process i.e. a set of interrelated components in the form of operations, algorithms, and data sources that work together to produce a spatial representation for descriptive locational references.

Geocode Example Google Maps¶

geocode_example Source: Fiverr

The geographic coordinates representing locations often vary greatly in positional accuracy. Examples include building centroids, land parcel centroids, interpolated locations based on thoroughfare ranges, street segments centroids, postal code centroids (e.g. ZIP codes, CEDEX), and Administrative division Centroids. Source: https://en.wikipedia.org/wiki/Geocoding

Geocoding with GeoPy¶

Installing Geopy¶

$ pip install geopy

About GeoPy¶

geopy is a Python 2 and 3 client for several popular geocoding web services. geopy makes it easy for Python developers to locate the coordinates of addresses, cities, countries, and landmarks across the globe using third-party geocoders and other data sources. Source: https://geopy.readthedocs.io/en/stable/

GeoPy Geocoder Services¶

  • ArcGIS
  • AzureMaps
  • Baidu
  • BANFrance
  • Bing
  • DataBC
  • GeocodeEarth
  • GeocodeFarm
  • Geolake
  • GeoNames
  • GoogleV3
  • HERE
  • IGNFrance
  • MapBox
  • OpenCage
  • OpenMapQuest
  • Nominatim
  • Pelias
  • Photon
  • PickPoint
  • LiveAddress
  • TomTom
  • What3Words
  • Yandex
In [96]:
import uuid
from geopy.geocoders import Nominatim

Create a new variable called geolocator as our object of the Nominatim() geocoding class¶

In [97]:
geolocator = Nominatim(user_agent=f"{uuid.uuid4()}")

Let's create a variable called location that geocodes "175 5th Avenue NYC".¶

In [98]:
office_madrid = "Calle del Dr. Castelo, 44, 28009 Madrid"
location = geolocator.geocode(office_madrid)

Let's see this location's standardized geocoded address.¶

In [99]:
print(location.address)
44, Calle del Doctor Castelo, Ibiza, Retiro, Madrid, Área metropolitana de Madrid y Corredor del Henares, Comunidad de Madrid, 28009, España

Do you think this location object has a latitude and longitude???¶

In [100]:
print(location.raw)
{'place_id': 24541438, 'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright', 'osm_type': 'node', 'osm_id': 2679605101, 'boundingbox': ['40.4201156', '40.4202156', '-3.6739201', '-3.6738201'], 'lat': '40.4201656', 'lon': '-3.6738701', 'display_name': '44, Calle del Doctor Castelo, Ibiza, Retiro, Madrid, Área metropolitana de Madrid y Corredor del Henares, Comunidad de Madrid, 28009, España', 'class': 'place', 'type': 'house', 'importance': 0.7300099999999999}

Let's see what the type() is of location.point¶

In [101]:
type(location.point)
Out[101]:
geopy.point.Point
In [102]:
import warnings
warnings.filterwarnings('ignore')
# See README.md for all libraries and steps to create the environment. 

import pandas as pd
import geopandas as gpd
from shapely import wkt
import matplotlib.pyplot as plt
import rasterio
from rasterstats import zonal_stats
import numpy as np
import osmnx as ox

%matplotlib inline

🔵 OSMnx¶

Python for street networks. Retrieve, model, analyze, and visualize street networks and other spatial data from OpenStreetMap.

https://github.com/gboeing/osmnx

In [103]:
office_madrid_retiro_network = ox.graph_from_address(
    address=office_madrid, dist=1000,   # meters
    dist_type='network', retain_all=True,
    truncate_by_edge=False, simplify=False)
In [104]:
# you can project the network to UTM (zone calculated automatically)
office_madrid_retiro_network_projected = ox.project_graph(office_madrid_retiro_network)
# source: https://github.com/gboeing/osmnx-examples/tree/master/notebooks
graph_gdfs = ox.graph_to_gdfs(office_madrid_retiro_network_projected, edges=True, fill_edge_geometry=True)
retiro_intersections = graph_gdfs[0]
retiro_streets = graph_gdfs[1]
In [105]:
from geopy.geocoders import Nominatim
from shapely.geometry import Point  # importing the Point class
from geopandas import GeoDataFrame

data = {'address': [office_madrid], 'name': ['Office Retiro']}

geolocator = Nominatim(user_agent='sdsc_geospatial')

df = pd.DataFrame(data, columns=data.keys())

df['geocode'] = df['address'].apply(geolocator.geocode)  

df['geometry'] = df['geocode'].apply(
    lambda x: 
        Point(x.longitude, x.latitude)
) # create a geometry column

address = GeoDataFrame(df, geometry='geometry', crs='epsg:4326')
retiro_point = address[address['address'] == office_madrid]
In [106]:
# source: https://github.com/gboeing/osmnx-examples/tree/master/notebooks
fig, ax = plt.subplots(figsize=(6, 6))

retiro_point.to_crs(epsg=25830).buffer(1000).plot(ax=ax).set_axis_off()  # 2 km buffer
retiro_streets.to_crs(epsg=25830).plot(ax=ax, color='red')
retiro_intersections.to_crs(epsg=25830).plot(ax=ax, color='pink')

retiro_point.to_crs(epsg=25830).plot(ax=ax, color='black', markersize=1000, marker='X')  # 2 km buffer


plt.title('Radial vs. Network Buffer')

plt.show();

🔵 Additional Resources¶

  • Class for Pratt SAVI 810 2020-03: Intro to Python Scripting for Geospatial - https://github.com/pratt-savi-810/pratt-savi-810-2020-03
  • Geographic Data Science Lab is a research centre at the University of Liverpool - https://www.liverpool.ac.uk/geographic-data-science/
  • CSC Finland – IT Center for Science - Welcome to Introduction to Python GIS -course 2018 - https://automating-gis-processes.github.io/CSC/index.html
  • Geographic Data Science, a course designed by Dr. Dani Arribas-Bel - https://darribas.org/gds_course/content/home.html